
Bugfix/htlc flush shutdown #8167

Merged

Conversation

ProofOfKeags
Collaborator

@ProofOfKeags ProofOfKeags commented Nov 10, 2023

Change Description

This change fixes a bug where we would fail channels if our peer sent a shutdown message while there were still live HTLCs on the channel. Fixes #6039

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error) paths are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

Summary by CodeRabbit

  • New Features

    • Introduced support for cooperative close of channels with active HTLCs.
    • Added a non-blocking option to the CloseChannel RPC call for faster operations when the closing transaction ID is not required.
  • Bug Fixes

    • Resolved issues related to channel closure and payment settlement, removing the need for artificial delays in integration tests.
  • Documentation

    • Updated the release notes to reflect compliance with BOLT2 shutdown procedures during in-flight HTLCs.
  • Tests

    • Added new integration tests to ensure proper channel closure behavior with active HTLCs.
    • Enhanced existing tests to adapt to the new channel closing logic and HTLC management.
  • Refactor

    • Refined channel state management and shutdown logic for improved clarity and control.
    • Restructured the channel closer logic into more focused methods for better lifecycle handling.

@ProofOfKeags ProofOfKeags self-assigned this Nov 10, 2023
@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 3 times, most recently from 0b12f1d to a82f325 Compare November 11, 2023 02:31
@saubyk saubyk added this to the v0.18.0 milestone Nov 12, 2023
Member

@Roasbeef Roasbeef left a comment


Great PR! MVar certainly seems very powerful, but I have to confess that I still need to study the implementation more closely to make sure there aren't any obvious pitfalls. In the past, we've realized that sync.Cond can take a bit of care to ensure it's used properly.

Left some comments in line re another area we need to cancel back (mailbox.FailAdd when handling down stream requests). Also I think we need to add a new continuation flow from the link to the chan closer. After shutdown is sent+recv'd, and only after the flush has fully completed should the state machine in the ChanCloser be allowed to progress. I think we can do this by adding the continuation logic to the closure we send as a callback to Flush.

fn/mvar.go Outdated
"sync/atomic"
)

// MVar[A any] is a structure that is designed to store a single value in an API
Member

I think this could use some more detail in the commit body re the rationale/usage for such a variable. E.g.: does the M prefix have any intended semantic meaning?

Collaborator Author

@ProofOfKeags ProofOfKeags Nov 15, 2023


Would you rather have it in the commit body or just in the godoc itself? I believe the "M" is for "Mutable". The API comes from here. Probably silly in this context since everything is mutable in go. I'm not attached to the name.

fn/mvar.go Outdated
// a non-nil out it means we can actually finish the Take.
m.mutex.Lock()
for val = m.value.Swap(nil); val == nil; val = m.value.Swap(nil) {
m.tCond.Wait()
Member

Do we need to thread through a context.Context here to allow any blocked callers to automatically exit if a shutdown is initiated?

Collaborator Author

Probably a good idea. We can make it optional and if they don't provide one then we just block til the heat death of the universe.

Collaborator Author

Upon second look, even if we accept a context here it raises the question of whether we want to have an error value that can be returned from this function. It would impair the interface unless it was acceptable to panic, which I suspect it wouldn't be. Otherwise we now have an error type we need to add to the interface which would have to be checked all the time. If you think it's worth it to have it be cancellable like that but less ergonomic, then we can.

Collaborator

Wouldn't we get a deadlock if there's no way to stop waiting?

Collaborator Author

Isn't that true of a number of other synchronization tools too, including go channels? It is true that misuse of this API will absolutely result in a deadlock. Suppose we do want to make these waits cancellable, though: are we willing to panic the thread, or do we really want to check for async cancellation everywhere? I see an argument for both.

fn/mvar.go Outdated (resolved)
fn/mvar.go Outdated (resolved)
htlcswitch/interfaces.go Outdated (resolved)
// net new HTLCs rather than forwarding them. This is the first
// opportunity we have to bounce invalid HTLC adds without
// doing a force-close.
if l.IsFlushing() {
Member

👍

Collaborator

We should only do this cancel-back logic if the state of fwdPkg is FwdStateLockedIn. If the state is FwdStateProcessed, the htlc may already have been sent to the outgoing link in a prior invocation of processRemoteAdds, and canceling back would be unsafe.

@@ -137,7 +137,7 @@ type ChannelUpdateHandler interface {
// ShutdownIfChannelClean shuts the link down if the channel state is
// clean. This can be used with dynamic commitment negotiation or coop
// close negotiation which require a clean channel state.
ShutdownIfChannelClean() error
ShutdownHtlcManager()
Member

godoc comment not updated.

Member

But if we have the new flush call, then in situations where there's a flush to shut down, can't we just add that context as an arg to Flush? This way the caller doesn't need to reach as far into the implementation details of how the link works. Instead, if the extra arg is set, the link starts to shut itself down when it's safe to do so.

Technically, the link should still be "up" (but in the fully flushed state) until shutdown is sent and recv'd by both parties, as there's a provision that lets either side send a new update_fee to adjust the current fee level.

htlcswitch/link.go (resolved)
peer/brontide.go Outdated (resolved)
peer/brontide.go Outdated
// internal maps. We don't call WipeChannel as the channel must
// still be in the activeChannels map to process coop close
// messages.
p.cfg.Switch.RemoveLink(cid)
Member

IIUC, we need to insert the continuation signal here. In other words, in the case where we're trying to shut down to do a co-op close, the control flow (when we receive the request from our remote peer) needs to block after sending shutdown (it shouldn't send closing_signed until the shutdown is finished). For the RPC flow, IMO we still want to just hard exit here as default behavior. We can add a new sync flow with a new API or an arg to the existing API.

fn/mvar.go Outdated (resolved)
@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 6 times, most recently from 334dd7e to 7c94e0d Compare November 16, 2023 01:28
@ProofOfKeags
Collaborator Author

Replaces #8145

@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 2 times, most recently from 4ef3119 to 91f38ae Compare November 16, 2023 02:00
@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 4 times, most recently from 4e8698e to c7462df Compare November 27, 2023 00:31
Member

@yyforyongyu yyforyongyu left a comment

Trying to learn more about MVar and got a bit confused - how is this different from a go channel? I guess it has a Read method - but other than that IMO a go channel can do a better job? For instance, I can re-implement the methods of MVar like this,

type GVar[A any] struct {
	c chan A
}

func NewEmptyGVar[A any]() GVar[A] {
	return GVar[A]{c: make(chan A, 1)}
}

func NewGVar[A any](a A) GVar[A] {
	v := GVar[A]{c: make(chan A, 1)}
	v.c <- a
	return v
}

// Take will wait for a value to be put into the GVar and then immediately
// take it out.
func (g *GVar[A]) Take() A {
	return <-g.c
}

// TryTake is the non-blocking version of Take, it will return a
// None() Option if it would have blocked.
func (g *GVar[A]) TryTake() Option[A] {
	select {
	case a := <-g.c:
		return Some(a)
	default:
		return None[A]()
	}
}

// Put will wait for the GVar to be made empty and will immediately fill it
// with the argument.
func (g *GVar[A]) Put(a A) {
	g.c <- a
}

// TryPut is the non-blocking version of Put and will return true if the GVar
// is successfully set.
func (g *GVar[A]) TryPut(a A) bool {
	select {
	case g.c <- a:
		return true
	default:
		return false
	}
}

I guess a more fundamental question is, why should we use this MVar, which was built for Haskell to control concurrency, while go insists we should share memory by communicating. Meanwhile I find this guy's answer interesting, as Haskell also does CSP.

Aside from that, ideally I think links should be managed by the switch, and brontide would only need to interact with the switch. When the channel is shutting down, brontide just calls an interface method switch.CloseLink(linkID), which returns a channel that brontide will listen to; it will be closed when the link is fully cleaned up.

htlcswitch/interfaces.go Outdated (resolved)
// the channel state reaches zero htlcs. This hook will only ever be
// called once. This function returns a FlushId that can be used to
// preemptively remove this hook.
OnFlushedOnce(func()) FlushHookID
Member

I think we are exposing too much implementation details of link, instead we should seek designs that can deepen the interface methods.

Collaborator Author

Do you have an algebra in mind for the link? This API was one that I came up with to strike a balance between hiding implementation details and not completely redesigning LND. The link, in my interpretation, is a coroutine that drives the channel state machine and manages "link state" (the state of the connection that cannot be reflected in the commitment tx).

Collaborator Author

I don't think it is unreasonable for us to inject lifecycle hooks into the link. There are numerous provisions in the spec that state that certain messages should only be sent at certain "blessed" junctions. I picked a couple of those out here:

  1. Flushed: there are no live HTLCs
  2. Commit(out): we just committed to a new state
  3. Commit(in): we just received a new commitment

There are a few others that may be interesting but not relevant here:

  1. Revoke(in): state n-1 is now invalid and new things are now safe that weren't before
  2. Revoke(out): state n-1 is now invalid and there are things that were safe before that aren't now
  3. Stop: the link state is consistent and will not accept any new updates, completing the lifecycle
  4. Start: the link is now live and capable of processing updates

There are probably more but I think this is a very clean external API for the link given what I understand about its role.
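The "blessed junction" hooks above can be sketched as a one-shot callback registry. This is an illustrative sketch only: `oneShotHooks`, `hookID`, and the method names are invented here and may differ from the PR's actual `OnFlushedOnce`/`FlushHookID` machinery.

```go
package main

import (
	"fmt"
	"sync"
)

// hookID identifies a registered hook so it can be removed preemptively,
// analogous to the FlushHookID returned by OnFlushedOnce.
type hookID uint64

// oneShotHooks fires each registered callback exactly once at a lifecycle
// junction (e.g. Flushed, Commit(out)), then discards it.
type oneShotHooks struct {
	mu     sync.Mutex
	nextID hookID
	hooks  map[hookID]func()
}

func newOneShotHooks() *oneShotHooks {
	return &oneShotHooks{hooks: make(map[hookID]func())}
}

// add registers a callback and returns an ID usable with remove.
func (h *oneShotHooks) add(f func()) hookID {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.nextID
	h.nextID++
	h.hooks[id] = f
	return id
}

// remove preemptively deregisters a hook before it fires.
func (h *oneShotHooks) remove(id hookID) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.hooks, id)
}

// fire invokes and clears all pending hooks; each runs at most once.
func (h *oneShotHooks) fire() {
	h.mu.Lock()
	pending := h.hooks
	h.hooks = make(map[hookID]func())
	h.mu.Unlock()
	for _, f := range pending {
		f()
	}
}

func main() {
	flushed := newOneShotHooks()
	flushed.add(func() { fmt.Println("flushed: safe to progress the ChanCloser") })
	id := flushed.add(func() { fmt.Println("never runs") })
	flushed.remove(id)
	flushed.fire() // fires the remaining hook once
	flushed.fire() // fires nothing: hooks are one-shot
}
```

The one-shot property matters because a message like `shutdown` must be queued at exactly one junction, not re-sent every time the link flushes.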

Member

Do you have an algebra in mind for the link?

Still need to finish my TLA+ course, so no algebra here. However, I do think these are really nice structured states that we can build into the link's lifecycle, as IMO assigning a state to the link and building actions based on it can really help us clean things up here.

That being said, I think the current design functions; I'm just not sure if there's a better alternative. For instance, when I see that we now need to call link.OnCommitOnce, link.DisableAdd and switch.RemoveLink, the mixed API calls to link and switch make me think we should only expose them at the switch level, maybe something like switch.DisableLink, so we could expose less.

Collaborator Author

I think we should have a conversation about what subsystems "own" each other. The way I've conceptualized everything right now is that we have:

Brontide <-> ChannelLink <-> Switch <-> ChannelLink <-> Brontide

and that each of these subsystems has a "peer" relationship with one another. That said, I can imagine and can endorse a design where the Switch "owns" the ChannelLinks. This makes sense to a degree because we already rely on the Switch to be able to look up the ChannelLink from the ChanID so there are already sketches of this relationship in the codebase today.

I don't think we should remove the API I put in here since I think it accurately describes what the role of the link is, but making the Brontide responsible for coordinating the lifecycle continuations is probably neither necessary nor ideal.

Does this match what you are thinking?

Member

Yep, I don't think we need to remove the API either, either on interface level or implementation level. But because making the Brontide responsible for coordinating the lifecycle continuations is probably neither necessary nor ideal, I'd say not expose link in brontide. Sharing some notes I've taken previously,

- When we init the shutdown:
	- request shutdown via `rpcServer.CloseChannel`
		- link is closed via `switch.CloseLink`, which creates a `htlcswitch.ChanClose` message that has a channel `Updates` that's used by the `chanCloser` to notify the channel is closed, then,
			- `switch` calls `s.cfg.LocalChannelClose`, mounted to `brontide.HandleLocalCloseChanReqs`
				- brontide calls `handleLocalCloseReq`:
					- 1. brontide creates a channel closer via `p.createChanCloser` and stores it in `p.activeChanCloses`, `htlcswitch.ChanClose` message is passed to this closer.
					- 2. brontide calls `chanCloser.ShutdownChan()`, in which,
						- `chanCloser` calls `initChanShutdown`, in which,
							- calls `c.cfg.DisableChannel`, that mounts to `ChanStatusMgr.RequestDisable`, in which,
								- `ChanStatusManager` calls `processDisableRequest`, in which,
									- it creates a channel update and calls `m.cfg.ApplyChannelUpdate`, mounts to `s.applyChannelUpdate` that sends a `ChannelUpdate` via the `authGossiper`
							- TODO: marks it broadcasted with a `nil` closing tx via `MarkCoopBroadcasted`, the reason being to keep the `listchannels` rpc consistent <- needs revisit
						- `chanCloser` sets its state to `closeShutdownInitiated`
					- 3. `brontide` calls `link.DisableAdds` which disables the outgoing link, which forbids us sending out htlcs to the peer.
					- 4. `brontide` sends the `shutdown` message to the peer
			- rpcserver now listens to `htlcswitch.ChanClose.Updates` which sends signals when the channel closing process has updates.

This is just the first part, where we send out the shutdown message; notice how we jump from switch to brontide to link, not to mention the subsequent handling of the peer's shutdown and closing_signed messages...

// addGateState is a piece of channelLink state that tracks whether
// we are allowed to add htlc's on either side of the link.
type addGateState struct {
outgoingAddsEnabled atomic.Bool
Member

Why do we need to have this gate state? so we can handle an update_add followed by a shutdown?

Collaborator Author

Yes. The link should be able to handle any sequence of instructions that will not create a protocol violation. Asking this to be the responsibility of the caller is not a great choice IMO.

Collaborator Author

If you disagree, I'd ask us to think about which subsystem is responsible for ensuring protocol adherence. It's fine if the link is supposed to be lower level, but afaict the link IS the channel state machine and as such should keep track of this state.
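A minimal sketch of the add-gating idea, assuming per-direction atomic flags. The `LinkDirection` constants mirror the `Incoming`/`Outgoing` distinction from this PR's interface changes, but the rest of the type is illustrative rather than the PR's exact code:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// LinkDirection distinguishes the two sides of the link, replacing magic
// boolean arguments.
type LinkDirection bool

const (
	Incoming LinkDirection = false
	Outgoing LinkDirection = true
)

// addGateState tracks whether new HTLC adds are permitted in each direction,
// so the link itself can reject a protocol-violating sequence (e.g. an
// update_add_htlc after shutdown) instead of trusting every caller.
type addGateState struct {
	outgoingEnabled atomic.Bool
	incomingEnabled atomic.Bool
}

func newAddGateState() *addGateState {
	g := &addGateState{}
	g.outgoingEnabled.Store(true)
	g.incomingEnabled.Store(true)
	return g
}

func (g *addGateState) gate(dir LinkDirection) *atomic.Bool {
	if dir == Outgoing {
		return &g.outgoingEnabled
	}
	return &g.incomingEnabled
}

// DisableAdds errors if adds were already disabled in that direction,
// matching the double-disable check exercised in the PR's tests.
func (g *addGateState) DisableAdds(dir LinkDirection) error {
	if !g.gate(dir).Swap(false) {
		return errors.New("adds already disabled")
	}
	return nil
}

// IsFlushing reports whether the given direction has adds disabled.
func (g *addGateState) IsFlushing(dir LinkDirection) bool {
	return !g.gate(dir).Load()
}

func main() {
	g := newAddGateState()
	fmt.Println(g.IsFlushing(Outgoing))         // false: adds start enabled
	fmt.Println(g.DisableAdds(Outgoing) == nil) // true: first disable succeeds
	fmt.Println(g.IsFlushing(Outgoing))         // true: now flushing outgoing
	fmt.Println(g.DisableAdds(Outgoing) != nil) // true: second disable errors
}
```

Keeping this state inside the link (rather than in brontide or the switch) is what lets the link answer the "update_add followed by shutdown" question on its own.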

Member

Didn't have an opinion when I asked, purely out of curiosity.

peer/brontide.go Outdated
@@ -3617,6 +3617,22 @@ func (p *Brontide) StartTime() time.Time {
// message is received from the remote peer. We'll use this message to advance
// the chan closer state machine.
func (p *Brontide) handleCloseMsg(msg *closeMsg) {
link := p.fetchLinkFromKeyAndCid(msg.cid)
if link == nil {
// TODO(proofofkeags): is this ever a valid situation?
Collaborator

There are two situations where this can happen:

  • somebody sends us a message for a link that doesn't exist
  • the link was removed via RemoveLink from an earlier closing_signed and we're receiving another one. This can happen when they don't agree with our proposed fee, and the empty if statement here will prevent us from completing the coop close. To fix this, we need to check whether we have an active chanCloser and then process the closing_signed.

Normally the latter case shouldn't happen because LND fast-completes coop close in 3 messages, but other implementations are allowed to do whatever they want.

Collaborator Author

So it seems, then, that the appropriate thing to do here is to just ignore any link lifecycle registrations if there is no valid link for the CID.

Collaborator

I don't follow, can you elaborate? Currently in the code as written, link can be nil here and we'd still need to process the closing_signed. I think we just need to process the closing_signed even if the link is down

Collaborator Author

I am currently fetching the link because, later in this function, I need to add link lifecycle hooks (introduced in an earlier commit in this PR). It seems like if the link has already been removed then we wouldn't need to insert those hooks. I will update the PR to reflect this understanding, and maybe the code will be clearer than my explanation.

@ProofOfKeags
Collaborator Author

ProofOfKeags commented Nov 27, 2023

@yyforyongyu

Trying to learn more about MVar and got a bit confused - how is this different from go channel? I guess it has a Read method - but other than that IMO go channel can do a better job?

Channels do a better job up until the point where you try to read the front of the channel without changing state. In fact my initial implementation looks exactly like the implementation you wrote out.

I guess a more fundamental question is, why should we use this MVar which is built for Haskell to control concurrency, while go insists we should share memory by communicating. Meanwhile I find this guy's answer interesting as Haskell also does CSP.

The current version of this PR doesn't actually use MVar anymore, and if I get to the end of this task and find I don't need it, I'll cherry-pick the commit, put it in a different draft PR, and drop it from this PR. I put it in originally because an earlier implementation (which had fundamental design issues) made use of it, and I chose to use it there because it was the first tool I reached for. In particular, I wanted the Read/TryRead functionality, something go channels do not provide.
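To make the Read/TryRead gap concrete: a bare go channel cannot observe its buffered value without removing it, and taking the value out just to put it back races with concurrent producers. A mutex-guarded container can read in place. This is an illustrative sketch under those assumptions, not lnd's `fn.MVar` implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// readableVar is a single-slot container supporting a non-destructive Read,
// the capability a plain chan cannot provide atomically.
type readableVar[A any] struct {
	mu   sync.Mutex
	cond *sync.Cond
	val  *A
}

func newReadableVar[A any](a A) *readableVar[A] {
	v := &readableVar[A]{val: &a}
	v.cond = sync.NewCond(&v.mu)
	return v
}

// Read blocks until a value is present, returns it, and leaves it in place.
func (v *readableVar[A]) Read() A {
	v.mu.Lock()
	defer v.mu.Unlock()
	for v.val == nil {
		v.cond.Wait()
	}
	return *v.val
}

// TryRead returns (value, true) if a value is present, without removing it.
func (v *readableVar[A]) TryRead() (A, bool) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.val == nil {
		var zero A
		return zero, false
	}
	return *v.val, true
}

func main() {
	v := newReadableVar(42)
	fmt.Println(v.Read()) // 42
	fmt.Println(v.Read()) // 42 again: Read does not consume the value
}
```

With a `chan A` of capacity 1, the closest emulation is receive-then-resend, which momentarily empties the slot and so is not linearizable against concurrent Put/Take.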

Aside from that, ideally I think links should be managed by switch and brontide would only need to interact with switch. When the channel is shutting down, brontide just calls an inteface method switch.CloseLink(linkID), which returns a channel that brontide will listen to, and it will be closed when link is fully cleaned.

Maybe, but I don't currently intend to do that in this PR. Happy to discuss this architectural change as I do believe that ownership and message flow are not ideal at the moment. I think the fact that the link is both driven by the brontide and the switch is manageable but I agree that there may be opportunities to simplify things and fix a lot of latent bugs by mediating all link access through the switch.

@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 3 times, most recently from cc17c90 to ccfa32c Compare November 28, 2023 04:53
fn/mutguard.go Outdated (resolved)
In order to handle shutdown requests when there are still HTLCs on
the link, we have to manage the shutdown process via the link's
lifecycle hooks. This means we can't use the simple `tryLinkShutdown`
anymore; instead we queue a `Shutdown` message at the next opportunity
to do so -- when we send our next `CommitSig`.
We don't need to try a link shutdown when the chan closer is fetched
since, as of this commit, the only callsite manages the shutdown semantics.
After removing the call to tryLinkShutdown, we no longer need the
function at all.
When the link is flushing in the incoming direction, it means
adds are invalid. The best chance we have at dealing with this
is to drop the connection. This should roll back the channel
state to the last CommitSig. If the remote has already sent a
CommitSig we haven't received yet, channel state will be
re-synchronized with a ChannelReestablish message upon
reconnection and the protocol state that caused us to flush
the link will be rolled back. In the event that there was some
non-deterministic behavior in the remote that caused them to
violate the protocol, we have a decent shot at correcting it
this way, since reconnecting will put us in the cleanest
possible state to try again.
@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch 3 times, most recently from fcdc7a2 to 49b7d2f Compare January 23, 2024 00:04
@yyforyongyu
Member

@coderabbitai review

// Have alice attempt to close the channel.
closeClient := alice.RPC.CloseChannel(&lnrpc.CloseChannelRequest{
ChannelPoint: chanPoint,
NoWait: true,
Member

Nice!

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 16

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 41c167d and 343938540162e2cd2a25c18f47032580c351f0e9.
Files ignored due to path filters (4)
  • go.mod is excluded by: !**/*.mod
  • go.sum is excluded by: !**/*.sum
  • lnrpc/lightning.pb.go is excluded by: !**/*.pb.go
  • lnrpc/lightning.swagger.json is excluded by: !**/*.json
Files selected for processing (20)
  • docs/release-notes/release-notes-0.18.0.md (2 hunks)
  • htlcswitch/interfaces.go (1 hunks)
  • htlcswitch/link.go (15 hunks)
  • htlcswitch/link_test.go (3 hunks)
  • htlcswitch/mock.go (1 hunks)
  • itest/list_on_test.go (1 hunks)
  • itest/lnd_coop_close_with_htlcs_test.go (1 hunks)
  • itest/lnd_mpp_test.go (1 hunks)
  • itest/lnd_psbt_test.go (3 hunks)
  • itest/lnd_routing_test.go (1 hunks)
  • itest/lnd_wallet_import_test.go (1 hunks)
  • itest/lnd_zero_conf_test.go (1 hunks)
  • lnrpc/lightning.proto (2 hunks)
  • lntest/harness.go (1 hunks)
  • lnwallet/chancloser/chancloser.go (13 hunks)
  • lnwallet/chancloser/chancloser_test.go (4 hunks)
  • lnwallet/channel.go (1 hunks)
  • peer/brontide.go (7 hunks)
  • peer/test_utils.go (4 hunks)
  • rpcserver.go (2 hunks)
Files skipped from review due to trivial changes (5)
  • itest/lnd_mpp_test.go
  • itest/lnd_psbt_test.go
  • itest/lnd_routing_test.go
  • itest/lnd_wallet_import_test.go
  • itest/lnd_zero_conf_test.go
Additional comments: 51
itest/lnd_coop_close_with_htlcs_test.go (9)
  • 14-19: The test case description is clear and concise.
  • 25-27: The OpenChannel method is used correctly to set up a channel with the specified amount.
  • 32-44: The setup for the HODL invoice is correct, using a random preimage and its hash.
  • 45-46: The AddHoldInvoice and SubscribeSingleInvoice RPC calls are used appropriately to create and monitor the invoice.
  • 53-58: The SendPaymentRequest is constructed correctly with the necessary fields, including a timeout and fee limit.
  • 64-68: The CloseChannel RPC call is correctly used with the NoWait flag set to true, indicating the channel should close without waiting for HTLCs to settle.
  • 73-78: The SettleInvoice call is used correctly to settle the invoice, and the subsequent receipt of the update is handled properly.
  • 85-96: The assertions for the closePending update are correct, ensuring that the update is received and the transaction ID is valid.
  • 98-102: The test correctly waits for the close transaction to be mined and for the channel to be fully closed.
itest/list_on_test.go (1)
  • 569-572: The addition of the new test case "coop close with htlcs" to the allTestCases slice is correct.
docs/release-notes/release-notes-0.18.0.md (2)
  • 68-71: The release notes correctly document the change allowing shutdowns while HTLCs are in-flight, ensuring compliance with BOLT2.
  • 131-135: The addition of the new flag to the CloseChannel RPC method is documented correctly, explaining its purpose and usage.
lnwallet/chancloser/chancloser_test.go (6)
  • 195-195: The CompleteCooperativeClose method in the mockChannel struct is correctly returning a non-nil wire.MsgTx pointer, which is necessary for the test to proceed.
  • 399-399: The test TestMaxFeeBailOut correctly simulates the scenario where the negotiated fee rate rises above the maximum fee, and the error handling is appropriate.
  • 531-539: The ReceiveShutdown and BeginNegotiation methods are used correctly in the test TestTaprootFastClose, reflecting the changes in the method signatures and control flow.
  • 544-552: The test continues to correctly simulate the cooperative close process with the updated method calls and assertions.
  • 557-562: The test ensures that the ReceiveClosingSigned method processes the closing signed message correctly and that the channel closing transaction is non-nil.
  • 603-607: The final steps of the cooperative close process are correctly tested, with the appropriate assertions and error handling.
htlcswitch/interfaces.go (3)
  • 137-161: The methods EnableAdds, DisableAdds, IsFlushing, OnFlushedOnce, and OnCommitOnce have been added to the ChannelUpdateHandler interface. Ensure that all implementations of this interface have been updated to include these new methods.
Verification successful

The new methods EnableAdds, DisableAdds, IsFlushing, OnFlushedOnce, and OnCommitOnce have been implemented in the ChannelUpdateHandler interface as evidenced by the output from the shell scripts. All implementations of this interface have been updated to include these new methods.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for all implementations of ChannelUpdateHandler to verify new methods are implemented.
ast-grep --lang go --pattern $'type $_ struct { $$$ } func ($_ *$_) EnableAdds($_ $_) $_ { $$$ } func ($_ *$_) DisableAdds($_ $_) $_ { $$$ } func ($_ *$_) IsFlushing($_ $_) $_ { $$$ } func ($_ *$_) OnFlushedOnce($_) { $$$ } func ($_ *$_) OnCommitOnce($_, $_) { $$$ }'

Length of output: 926



Script:

#!/bin/bash
# Search for implementations of the EnableAdds method in ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) EnableAdds($_ $_) $_ { $$$ }'

# Search for implementations of the DisableAdds method in ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) DisableAdds($_ $_) $_ { $$$ }'

# Search for implementations of the IsFlushing method in ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) IsFlushing($_ $_) $_ { $$$ }'

# Search for implementations of the OnFlushedOnce method in ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) OnFlushedOnce($_) { $$$ }'

# Search for implementations of the OnCommitOnce method in ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) OnCommitOnce($_, $_) { $$$ }'

Length of output: 6422

  • 164-172: The introduction of `CommitHookID` and `FlushHookID` types is noted. Ensure that these types are used appropriately wherever hooks are managed within the `ChannelUpdateHandler`'s lifecycle.
  • 174-184: The `LinkDirection` type and its constants `Incoming` and `Outgoing` have been added. Verify that these constants are used consistently across the codebase and replace any magic boolean values that represent direction.
peer/test_utils.go (5)
  • 461-463: The mockUpdateHandler struct has been updated with new fields isOutgoingAddBlocked and isIncomingAddBlocked. Ensure that these fields are used correctly in the methods EnableAdds, DisableAdds, and IsFlushing.
  • 513-526: The EnableAdds method uses atomic operations to manage the state of adding HTLCs. Ensure that all calls to this method handle the returned error appropriately.
Verification successful

The verification process has confirmed that the calls to the EnableAdds method within the test suite handle the returned error appropriately, using assertions to check the error's value. There are no other calls to this method outside of the test files according to the search results.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to EnableAdds to verify error handling.
ast-grep --lang go --pattern $'$_.$EnableAdds($_)'

Length of output: 51



Script:

#!/bin/bash
# Search for calls to EnableAdds to verify error handling without specifying the method receiver.
rg 'EnableAdds\('

Length of output: 676
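The atomic, error-on-redundant-call contract that the review describes for `EnableAdds`/`DisableAdds`/`IsFlushing` can be sketched with a compare-and-swap on an `atomic.Bool`. This is a minimal illustration of the pattern, not the PR's implementation; the `addGate` type and its method names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// addGate tracks whether new HTLC adds are blocked in one direction.
// CompareAndSwap makes the idempotence check atomic: a second call in
// the same state fails instead of silently passing, so a redundant
// transition surfaces as an error the caller must handle.
type addGate struct {
	blocked atomic.Bool
}

func (g *addGate) DisableAdds() error {
	if !g.blocked.CompareAndSwap(false, true) {
		return errors.New("adds already disabled")
	}
	return nil
}

func (g *addGate) EnableAdds() error {
	if !g.blocked.CompareAndSwap(true, false) {
		return errors.New("adds already enabled")
	}
	return nil
}

func (g *addGate) IsFlushing() bool {
	return g.blocked.Load()
}

func main() {
	var g addGate
	fmt.Println(g.DisableAdds()) // <nil>
	fmt.Println(g.DisableAdds()) // adds already disabled
	fmt.Println(g.IsFlushing())  // true
}
```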

* 528-541: The `DisableAdds` method also uses atomic operations and returns an error if adds are already disabled. Similar to `EnableAdds`, verify that all calls to `DisableAdds` handle the returned error correctly.
Verification successful

The ripgrep search has successfully identified several calls to the DisableAdds method within the Go files of the codebase. The results show that this method is used in various places, including peer/brontide.go, htlcswitch/mock.go, htlcswitch/interfaces.go, and htlcswitch/link_test.go, among others.

To verify that the error returned by DisableAdds is handled correctly in each instance, we would need to examine the surrounding code for each occurrence. However, the output provided does show at least one instance in htlcswitch/link_test.go where the error is being checked:

require.Nil(t, aliceLink.DisableAdds(dir))
require.NotNil(t, aliceLink.DisableAdds(dir))

This indicates that in the test code, the error from DisableAdds is being handled by asserting that it is nil or not nil as expected in the tests.

Given this information, it can be inferred that the review comment regarding the verification of error handling for calls to DisableAdds is correct, and the calls to this method in the codebase do handle the returned error appropriately, at least in the test cases shown.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to DisableAdds to verify error handling.
ast-grep --lang go --pattern $'$_.$DisableAdds($_)'

Length of output: 52



Script:

#!/bin/bash
# Use ripgrep to search for calls to DisableAdds to verify error handling.
rg 'DisableAdds' --go

Length of output: 450



Script:

#!/bin/bash
# Correct the ripgrep command to search for calls to DisableAdds in Go files.
rg 'DisableAdds' -g '*.go'

Length of output: 1136

* 543-552: The `IsFlushing` method checks the state of HTLC adds. Confirm that this method is used correctly wherever the state of HTLC adds needs to be checked.
Verification successful

The IsFlushing method is used correctly in the codebase to check the state of HTLC adds, as evidenced by its usage in tests and operational logic within the htlcswitch package.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to IsFlushing to ensure it's used correctly.
ast-grep --lang go --pattern $'$_.$IsFlushing($_)'

Length of output: 51



Script:

#!/bin/bash
# Search for the string "IsFlushing" to find references to the method in the codebase.
rg 'IsFlushing'

Length of output: 1423

* 554-562: The methods `OnFlushedOnce` and `OnCommitOnce` are designed to execute hooks. Since they are called immediately if conditions are met, ensure that the hooks passed to these methods are idempotent and safe to be called at any time.
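The "run immediately if the condition already holds, otherwise defer" behavior that makes hook idempotence matter can be sketched like this. The `flushWaiter` type is hypothetical and omits the event-loop serialization the real link would use.

```go
package main

import "fmt"

// flushWaiter runs a hook immediately if the channel is already
// clean, otherwise queues it for the next flush event. Because the
// hook may fire from either path, it must be idempotent and safe to
// call at any time — the property the review asks for.
type flushWaiter struct {
	clean bool
	hooks []func()
}

func (w *flushWaiter) OnFlushedOnce(hook func()) {
	if w.clean {
		hook()
		return
	}
	w.hooks = append(w.hooks, hook)
}

func (w *flushWaiter) markFlushed() {
	w.clean = true
	for _, h := range w.hooks {
		h()
	}
	w.hooks = nil // one-shot: queued hooks never fire twice
}

func main() {
	var w flushWaiter
	w.OnFlushedOnce(func() { fmt.Println("shutdown ready") })
	w.markFlushed() // prints "shutdown ready"
	// Channel is now clean, so this hook runs immediately.
	w.OnFlushedOnce(func() { fmt.Println("runs immediately") })
}
```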
lnwallet/chancloser/chancloser.go (9)
  • 14-14: The addition of the github.com/lightningnetwork/lnd/fn import suggests the introduction of functional programming constructs, such as options, which can be beneficial for handling optional values and avoiding nil pointer dereferences.
  • 74-77: The introduction of the closeAwaitingFlush state is consistent with the PR's objective to ensure that shutdown does not proceed while HTLCs are still pending. This aligns with the BOLT 2 specification.
  • 207-210: The addition of cachedClosingSigned using the fn.Option type is a good use of functional programming to handle optional values. This should help in managing the race condition mentioned in the comments.
  • 474-605: The ReceiveShutdown method has been refactored into a state machine pattern, which is a good practice for managing complex state transitions. However, ensure that all possible states are handled correctly and that the state transitions are valid.
  • 606-671: The BeginNegotiation method correctly transitions the state machine to the negotiation phase and handles the initiation of fee negotiation. Ensure that the logic for when to generate a ClosingSigned message is correct and that the state transitions are valid.
Verification successful

The output from the latest script provides the contents of the BeginNegotiation method within the ChanCloser struct from the lnwallet/chancloser/chancloser.go file. The snippet shows the method's logic for transitioning the state machine to the negotiation phase and initiating fee negotiation.

Based on the provided output, the BeginNegotiation method performs the following actions:

  1. It initializes a variable noClosingSigned with a None option of lnwire.ClosingSigned.
  2. It uses a switch statement on c.state to handle different states.
  3. In the closeAwaitingFlush case, it marks the channel as cooperatively closed with a nil transaction to ensure consistency with the listchannels RPC.
  4. It then sets the state to closeFeeNegotiation.
  5. If the channel is not the initiator, it checks for a cached remote offer to process.
  6. If the channel is the initiator, it proposes an initial ClosingSigned message.

The logic for when to generate a ClosingSigned message is correct, and the state transitions appear to be valid based on the expected behavior described in the PR objectives and the AI-generated summary. The method transitions the state machine to the negotiation phase and handles the initiation of fee negotiation as expected.

Given this information, the review comment is verified as the method does indeed correctly transition the state machine to the negotiation phase and handles the initiation of fee negotiation.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the BeginNegotiation method correctly transitions the state machine and initiates fee negotiation.
ast-grep --lang go --pattern $'func (c *ChanCloser) BeginNegotiation($_) {
  $$$
  switch c.state {
    $$$
  }
}'

Length of output: 115



Script:

#!/bin/bash
# Adjusting the ast-grep pattern to correctly match the BeginNegotiation method in the Go codebase.
ast-grep --lang go --pattern $'func (c *ChanCloser) BeginNegotiation() {
  $$$
  switch c.state {
    $$$
  }
}'

Length of output: 113



Script:

#!/bin/bash
# Using ripgrep to search for the BeginNegotiation method within the ChanCloser struct.
rg 'func \(c \*ChanCloser\) BeginNegotiation\('

Length of output: 125



Script:

#!/bin/bash
# Output the contents of the BeginNegotiation method from the chancloser.go file.
rg -A 30 'func \(c \*ChanCloser\) BeginNegotiation\(' lnwallet/chancloser/chancloser.go

Length of output: 1154
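The state-machine shape discussed for `ReceiveShutdown` and `BeginNegotiation` can be sketched as below. The state names mirror the review (`closeAwaitingFlush`, `closeFeeNegotiation`); the precise transitions and error messages are assumptions drawn from the summary, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

type closeState int

const (
	closeIdle closeState = iota
	closeAwaitingFlush
	closeFeeNegotiation
)

type chanCloser struct{ state closeState }

// ReceiveShutdown parks the closer until in-flight HTLCs clear,
// rather than failing the channel as the pre-fix code did.
func (c *chanCloser) ReceiveShutdown() error {
	if c.state != closeIdle {
		return errors.New("shutdown received in wrong state")
	}
	c.state = closeAwaitingFlush
	return nil
}

// BeginNegotiation fires once the link reports a clean channel and
// moves the closer into fee negotiation.
func (c *chanCloser) BeginNegotiation() error {
	if c.state != closeAwaitingFlush {
		return errors.New("negotiation before flush")
	}
	c.state = closeFeeNegotiation
	return nil
}

func main() {
	c := &chanCloser{}
	fmt.Println(c.ReceiveShutdown(), c.BeginNegotiation())
	fmt.Println(c.BeginNegotiation()) // repeat transition is rejected
}
```

Guarding each transition with an explicit state check is what makes invalid orderings (for example, negotiating fees before the flush completes) detectable as errors rather than silent protocol violations.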

* 752-792: > 📝 **NOTE** > This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [673-878]

The ReceiveClosingSigned method has been significantly refactored. It is crucial to ensure that the handling of taproot channels and the fee negotiation logic are correct. The state machine pattern is used here as well, so verify that the state transitions are valid and that the fee negotiation terminates appropriately.

Verification successful

The ReceiveClosingSigned method in the chancloser.go file has been verified. It correctly handles taproot channels and fee negotiation logic, with appropriate state transitions and termination of the fee negotiation process. The method's implementation aligns with the expected behavior described in the review comment.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the ReceiveClosingSigned method handles taproot channels and fee negotiation correctly.
ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($_, $_) {
  $$$
  switch c.state {
    $$$
  }
}'

Length of output: 123



Script:

#!/bin/bash
# Verify that the ReceiveClosingSigned method handles taproot channels and fee negotiation correctly.
# Correcting the pattern by removing the unnecessary single quote at the end.
ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($_, $_) {
  $$$
  switch c.state {
    $$$
  }
}'

Length of output: 123



Script:

#!/bin/bash
# Broaden the search criteria to find the ReceiveClosingSigned method within the ChanCloser struct.
ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($$$)'

Length of output: 14806

* 11-17: > 📝 **NOTE** > This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

The proposeCloseSigned method appears to handle the creation of a ClosingSigned message correctly. It is important to ensure that the logic for handling taproot channels and the saving of prior fee offers is correct.

  • 11-17: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

The helper functions for fee negotiation, such as feeInAcceptableRange, ratchetFee, and calcCompromiseFee, are critical for the correct operation of the channel closure process. Ensure that the logic for fee negotiation is sound and that the functions work as intended.

  • 11-17: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

The ParseUpfrontShutdownAddress function is a utility function that should correctly parse and validate the upfront shutdown address. Ensure that the function handles all edge cases and returns appropriate errors.

htlcswitch/link.go (9)
  • 359-378: The addition of the isOutgoingAddBlocked, isIncomingAddBlocked, flushHooks, outgoingCommitHooks, and incomingCommitHooks fields to the channelLink struct introduces new state management for HTLCs and commit hooks. Ensure that these new states are properly synchronized across goroutines to prevent data races.
  • 383-432: The hookMap type and associated methods newHookMap, alloc, and invoke are added for managing hooks in the channel lifecycle. Verify that the hookMap is used in a thread-safe manner since the warning comment indicates it is not thread-safe.
  • 1568-1577: The check for IsFlushing(Outgoing) in handleDownstreamUpdateAdd is a good addition for protocol compliance. However, ensure that the failure mode is correctly handled and that the peer is properly notified of the violation.
  • 1897-1929: The check for IsFlushing(Incoming) in the handleUpstreamMsg method when processing an UpdateAddHTLC message is critical for maintaining protocol correctness. Ensure that the link failure and disconnection are correctly handled in this scenario.
  • 2180-2185: The invocation of incomingCommitHooks after sending a revocation is a critical point in the channel's state transition. Verify that the hooks are invoked correctly and that the channel state is consistent after this operation.
  • 2223-2232: The conditional logic for sending a commitment signature in response to a received CommitSig message is important for maintaining channel state synchronization. Verify that the condition l.channel.OweCommitment() is correctly determining when to send a commitment signature.
  • 2235-2242: The invocation of flushHooks after processing a RevokeAndAck message is crucial for the channel's state management. Verify that the hooks are invoked only when the channel state is clean and that the state is correctly updated after this operation.
  • 2541-2546: The invocation of outgoingCommitHooks after sending a CommitSig message is a critical operation. Verify that the hooks are invoked correctly and that the channel state is consistent after this operation.
  • 2990-2992: The error handling for AddMessage to the mailbox in HandleChannelUpdate should be verified. Ensure that the error does not indicate a deeper issue that could affect the stability of the channel link.
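The `hookMap` lifecycle described above (allocate an ID per hook, invoke each pending hook once) can be sketched as follows. As the review notes, the structure is deliberately not thread-safe; the real link confines it to a single goroutine. Field and method names here are illustrative.

```go
package main

import "fmt"

type hookID uint64

// hookMap assigns each registered hook an ID so it can be invoked
// (and dropped) exactly once. NOT thread-safe: callers must
// serialize access, e.g. through the link's main event loop.
type hookMap struct {
	next  hookID
	hooks map[hookID]func()
}

func newHookMap() *hookMap {
	return &hookMap{hooks: make(map[hookID]func())}
}

func (m *hookMap) alloc(hook func()) hookID {
	id := m.next
	m.next++
	m.hooks[id] = hook
	return id
}

// invoke runs every pending hook, then removes it so a later
// event cannot re-fire it.
func (m *hookMap) invoke() {
	for id, hook := range m.hooks {
		hook()
		delete(m.hooks, id)
	}
}

func main() {
	m := newHookMap()
	m.alloc(func() { fmt.Println("commit hook fired") })
	m.invoke()
	m.invoke() // no-op: hooks are one-shot
}
```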
lnrpc/lightning.proto (2)
  • 2049-2053: The no_wait field has been added to the CloseChannelRequest message to allow for non-blocking channel closures. This change aligns with the PR objectives and user-provided facts.
  • 2060-2060: The InstantUpdate message has been added within the CloseStatusUpdate message. However, it is currently empty and does not contain any fields or documentation. This could be intentional for future extensions, but it's worth verifying to ensure it's not an oversight.
Verification successful

The InstantUpdate message is present in the lightning.proto file and has been incorporated into the generated Go and Swagger JSON code artifacts. However, it remains empty without any fields or documentation, which suggests that it could be a placeholder for future use or intended to be used as a trigger without carrying additional data. The original review comment is verified as accurate, and there is no evidence of an oversight.

Scripts Executed

The following scripts were executed for the analysis:


Script:

# Verify if there are any usages or planned usages for InstantUpdate message.
rg --type proto "InstantUpdate"

Length of output: 61



Script:

#!/bin/bash
# Search for any occurrences of "InstantUpdate" in the codebase.
rg "InstantUpdate"

Length of output: 1278

htlcswitch/link_test.go (1)
  • 7000-7025: The test TestLinkFlushApiGateStateIdempotence checks for idempotence by asserting that successive calls without state change result in an error. This is a good practice to catch unnecessary calls that may indicate bugs.

The test logic is sound and follows good testing practices.

rpcserver.go (2)
  • 2646-2649: The logic to check for active HTLCs before closing a channel has been updated to consider the NoWait flag. This change aligns with the PR's objective to allow channel closure without waiting for HTLCs to settle when NoWait is true.
  • 2694-2705: The addition of a conditional block to handle the NoWait flag during channel closure is implemented correctly. It sends an instant update if the user does not want to wait for the transaction ID, which is consistent with the PR's description.
lnwallet/channel.go (2)
  • 5469-5478: The implementation of NeedCommitment correctly acquires a read lock and defers its release, ensuring thread safety when accessing shared resources.
  • 5477-5477: Verify that the false argument passed to oweCommitment correctly checks for outstanding remote updates and updates in the remote commit transaction that are not yet reflected in the local commit transaction, as described in the method comment.
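The lock discipline noted for `NeedCommitment` — take the read lock, defer its release, and keep all shared-state reads inside the critical section — looks roughly like this. The `channel` struct and its field are hypothetical stand-ins for the real `LightningChannel` state.

```go
package main

import (
	"fmt"
	"sync"
)

type channel struct {
	mu             sync.RWMutex
	pendingUpdates int
}

// NeedCommitment reports whether there are updates not yet covered
// by a commitment. RLock allows concurrent readers while still
// excluding writers; defer guarantees release on every return path.
func (c *channel) NeedCommitment() bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.pendingUpdates > 0
}

func main() {
	c := &channel{pendingUpdates: 2}
	fmt.Println(c.NeedCommitment()) // true
}
```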

Comment on lines +909 to +912
func (f *mockChannelLink) EnableAdds(linkDirection LinkDirection) error {
// TODO(proofofkeags): Implement
return nil
}
Contributor

The EnableAdds method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?

Comment on lines +913 to +916
func (f *mockChannelLink) DisableAdds(linkDirection LinkDirection) error {
// TODO(proofofkeags): Implement
return nil
}
Contributor

The DisableAdds method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?

Comment on lines +917 to +920
func (f *mockChannelLink) IsFlushing(linkDirection LinkDirection) bool {
// TODO(proofofkeags): Implement
return false
}
Contributor

The IsFlushing method is marked with a TODO comment and returns a hardcoded false.

Would you like me to implement this method or should we create a GitHub issue to track this task?

Comment on lines +921 to +923
func (f *mockChannelLink) OnFlushedOnce(func()) {
// TODO(proofofkeags): Implement
}
Contributor

The OnFlushedOnce method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?

Comment on lines +924 to +926
func (f *mockChannelLink) OnCommitOnce(LinkDirection, func()) {
// TODO(proofofkeags): Implement
}
Contributor

The OnCommitOnce method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?

Comment on lines +2980 to +3001
link := p.fetchLinkFromKeyAndCid(chanID)
if link == nil {
// If the link is nil then it means it was already
// removed from the switch or it never existed in the
// first place. The latter case is handled at the
// beginning of this function, so in the case where it
// has already been removed, we can skip adding the
// commit hook to queue a Shutdown message.
p.log.Warnf("link not found during attempted closure: "+
"%v", chanID)
return
}

link.OnCommitOnce(htlcswitch.Outgoing, func() {
err := link.DisableAdds(htlcswitch.Outgoing)
if err != nil {
p.log.Warnf("outgoing link adds already "+
"disabled: %v", link.ChanID())
}

p.queueMsg(shutdownMsg, nil)
})
Contributor

The logic to disable outgoing link adds and queue the Shutdown message is duplicated. This should be refactored into a common method to avoid code duplication and potential inconsistencies in behavior.

- link.OnCommitOnce(htlcswitch.Outgoing, func() {
-   err := link.DisableAdds(htlcswitch.Outgoing)
-   if err != nil {
-     p.log.Warnf("outgoing link adds already disabled: %v", link.ChanID())
-   }
-   p.queueMsg(shutdownMsg, nil)
- })
+ p.disableOutgoingAndQueueShutdown(link, shutdownMsg)

}

// If we haven't finished close negotiations, then we'll continue as we
// can't yet finalize the closure.
if !closeFin {
if _, err := chanCloser.ClosingTx(); err != nil {
Contributor

The panic in the default case of the switch statement is too aggressive and could cause the entire peer to crash. It would be better to log an error and continue operation.

- default:
-   panic("impossible closeMsg type")
+ default:
+   p.log.Errorf("received unknown close message type: %v", typed)

Comment on lines +2069 to +2071
message InstantUpdate {
}

Contributor

The InstantUpdate message is defined but not documented or used within the provided context. It's important to ensure that this addition is intentional and that there is a plan for its implementation.

Would you like assistance in defining the purpose and fields for InstantUpdate, or should this be tracked in a separate issue for future implementation?

Member

@yyforyongyu yyforyongyu left a comment

itests are failing; aside from that, I think it's good to go once the rpc level behavior is settled. My suggestion is to defer the NoWait changes and itests to a new PR, and merge the rest to limit the scope. There we can then handle this case properly.

Main thing to be mindful of is that if we just block w/o any updates, users might think it's stuck and try to issue the command again, so the new logic should reject that since we already have a pending close active. Eventually we might also want to add a new state that we show for the channel re: closing pending, channel flushing, etc. Again, I think we can defer that to limit the scope of this PR.

@ProofOfKeags ProofOfKeags force-pushed the bugfix/htlc-flush-shutdown branch from 3439385 to 69ef1b2 Compare January 23, 2024 22:41
@Roasbeef
Member

My suggestion is to defer the NoWait changes and itests to a new PR, and merge the rest to limit the scope.

I think that NoWait flag, even if it's temp here is useful, as it's what enables the itest in the first place. The test should stay structurally the same later, but we'd use a diff flag to opt into the new async behavior.

Member

@Roasbeef Roasbeef left a comment

LGTM 🦎

// active htlcs` or `link failed to shutdown` if we close the channel.
// We need to investigate the order of settling the payments and
// updating commitments to understand and fix it.
time.Sleep(5 * time.Second)
Member

👍

@Roasbeef Roasbeef merged commit 1caca81 into lightningnetwork:master Jan 24, 2024
24 of 25 checks passed
7 participants